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Abstract 

Background: Biomolecular pathways and networks are dynamic and complex, and the perturbations to them 
which cause disease are often multiple, heterogeneous and contingent. Pathway and network visualizations, 
rendered on a computer or published on paper, however, tend to be static, lacking in detail, and ill-equipped to 
explore the variety and quantities of data available today, and the complex causes we seek to understand. 

Results: RCytoscape integrates R (an open-ended programming environment rich in statistical power and data- 
handling facilities) and Cytoscape (powerful network visualization and analysis software). RCytoscape extends 
Cytoscape's functionality beyond what is possible with the Cytoscape graphical user interface. To illustrate the 
power of RCytoscape, a portion of the Glioblastoma multiforme (GBM) data set from the Cancer Genome Atlas 
(TCGA) is examined. Network visualization reveals previously unreported patterns in the data suggesting 
heterogeneous signaling mechanisms active in GBM Proneural tumors, with possible clinical relevance. 

Conclusions: Progress in bioinformatics and computational biology depends upon exploratory and confirmatory data 
analysis, upon inference, and upon modeling. These activities will eventually permit the prediction and control of complex 
biological systems. Network visualizations - molecular maps - created from an open-ended programming environment 
rich in statistical power and data-handling facilities, such as RCytoscape, will play an essential role in this progression. 
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Background 

Molecular biology has made great progress in recent years 
by measuring the abundance and characteristics of many 
kinds of molecules, often at a global level Whole genomes 
have been sequenced, global mRNA and miRNA levels 
assessed, protein expression measured, phosphorylation 
and methylation states assayed. Many protein structures 
have been determined. Progress towards understanding 
the dynamic relations and interactions among these 
molecular components, however, has lagged significantly 
[1]. It is precisely these complex system behaviors which 
must be understood in order to comprehensively predict 
and control cellular processes in health and disease. 

Causal explanations in molecular biology of sufficient 
depth and completeness to explain disease, and to create 
the basis for successful therapy, are almost never simple. 
Cancer, for example, is currently understood to consist 
of six separate processes, or "hallmarks", each of which 
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is controlled by redundant and overlapping pathways 
[2]. Even classic single gene disorders show variable age 
of onset and severity, apparently due to the influence of 
modifier genes [3]. A recent theoretical framework 
establishes that the control of gene regulatory networks 
requires prior control of more than half the constituent 
nodes [4]. Phosphorylation networks exhibit similar 
complexity and resistance to manipulation [5]. As we 
explore and map this complex terrain, using ever larger 
amounts of heterogeneous (and often noisy) data, net- 
work visualization tools integrated within a statistically 
powerful programming environment will prove indis- 
pensable. RCytoscape provides one such set of tools. 

Many and diverse kinds of software will be needed in 
order to achieve prediction and control of cellular 
processes. We distinguish two broad classes on the basis of 
novelty. Software for routine bioinformatics, in which well- 
studied algorithms are applied to well-understood kinds of 
data, can be distinguished from software required for novel 
bioinformatics and computational biology, in which the 
data are often less well understood, and for which new al- 
gorithms must be developed. Routine bioinformatics is 
often accomplished with web-based and point-and-click 
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desktop applications. Gene Set Enrichment Analysis 
(GSEA), a Java application offered by the Broad Institute 
calculates over-representation in curated gene data sets for 
an experimenters expression data: one loads mRNA ex- 
pression data from a file, chooses the gene set categories of 
interest, obtains a list of enriched categories. No opportun- 
ity is provided to filter the input data, to transform it in 
possibly revealing ways, to correlate with related data, to 
display in the context of known gene and protein interac- 
tions, to apply experimental algorithms before and after the 
enrichment step. Novel bioinformatics and computational 
biology, however, require a programming language (or lan- 
guages). They depend upon robust and full-featured statis- 
tical and modeling libraries, easy access to many kinds of 
data and annotation, and strong visualization capabilities, 
harnessed together into a programming environment for ex- 
ploration, modeling and analysis. 

In a recent review of network display software incorpor- 
ating high throughput molecular biology data [6] 
Gehlenborg et al. conclude that "truly integrated 
visualization of systems biology data across the entire 
range of possible data types is still very much in its in- 
fancy." With one notable exception, the software included 
in this review are point-and-click web or desktop applica- 
tions. (The exception is a small set of multivariate gene 
expression analyses and visualizations accomplished with 
R, whose special capabilities and utility we discuss below). 
The effective integration and visualization of large quan- 
tities of multiple kinds of data requires frequent recourse 
to statistical programming and exploratory visualization. 
Normalization techniques, for example, must be chosen 
and applied with caution, iteratively and provisionally, with 
frequent recourse to visual assessment when data from di- 
verse sources are combined. For the foreseeable future 
these integration and visualization activities will not be 
routine, will require programming and interactive engage- 
ment with the data, and will thus be beyond the capabilities 
of point-and-click software performing routinized analyses. 

In addition to the above classification of software by 
novelty, we also distinguish, on another axis, different 
types of bioinformatic and computational biology activ- 
ities. Drawing in part upon an NIH classification [7] 
these categories include: exploratory data analysis, con- 
firmatory data analysis (including hypothesis testing), 
statistical inference (including clustering and classifica- 
tion), mathematical modeling, and simulation. Listed in 
an ascending order of complexity and rigor, and sorted 
into what might be early, medium and late stage bio- 
informatic or computational research activities, in fact 
these activities are interleaved and repeated, in an im- 
provised manner, throughout such research projects. 

Exploratory Data Analysis (EDA) [8] and information 
graphics [9] are two related disciplines associated, 
respectively, with statistician John Tukey and data 



visualization pioneer Edward Tufte. Together they pro- 
vide the rationale for high quality network visualization, 
and describe the role it can play in novel bioinformatics 
and computational biology. Their combined claim may 
be summarized thus: that the judicious display and ex- 
ploration of data contributes insight into the data and 
into possible causal relationships which may otherwise 
be missed. John Tukey: "[Visualization methods] ... are 
there, not as a technique, but rather as recognition that 
the picture-examining eye is the best finder we have of 
the wholly unanticipated" [10]. As a proponent of good 
techniques for visualizations, Tufte has said, "There is 
no such thing as information overload, just bad design. 
If something is cluttered and/or confusing, fix your 
design" [11]. Integrating a powerful scripting language 
with network visualization software, as we argue for here, 
empowers Tukeys picture-examining eye, and makes 
possible Tufte s injunction to "fix your [visual] design". 

Thus the goal is to improve network visualization: to in- 
crease the ease and sophistication with which detailed mo- 
lecular maps can be constructed, in order that they may 
contribute to all of the novel bioinformatic and computa- 
tional biology, network-related activities listed above. 

Software 

Cytoscape, which we first released in 2002 [12], has be- 
come the standard open source network visualization soft- 
ware used in molecular biology [13]. Over the years it has 
been refined, extended, and has attracted a large number 
of users and developers. The core data type is a network 
(a mathematical graph, or multigraph) having nodes and 
edges, and accompanied by any number of data attributes 
on those nodes and edges. Cytoscapes s "vizmapper" trans- 
lates node and edge data attributes into visible attributes 
(from gene expression to node color, for instance). A 
plugin architecture allows for extensions to the core code. 
Over one hundred plugins are available, providing access 
to many bioinformatic resources and analyses. 

Cytoscape, despite its many strengths, has not been well- 
suited to novel bioinformatics and computational biology, 
because it has lacked a full-featured, bioinformatically- 
capable scripting language. Three are three candidate open- 
source scripting languages in common use today: Perl, 
Python, and R. Cytoscape s ScriptingPluginEngine only sup- 
ports languages implemented to run in the Java Virtual 
Machine, thus ruling out Perl, R and anything but a limited 
version of Python ("Jython"). Ruby, JavaScript, Groovy, 
Clojure, and the aforementioned Jython are supported, but 
these languages are little used in and somewhat limited for 
bioinformatics. Jython, for instance, is unable to run 
NumPy, "the fundamental package for scientific computing 
with Python" [14] or to call out to other compiled code. 

Lacking a full-featured, bioinformatically-capable scripting 
language, the most effective way to extend the capabilities of 
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Cytoscape beyond those offered by the Cytoscape core 
development team, and by Cytoscape plugin writers, has 
been to write another plugin. Unfortunately, this is a task 
for a seasoned Java software developer, and largely impracti- 
cal and/or inefficient for research bioinformaticians and 
computational biologists, for two reasons. First, Java pro- 
gramming is complex and time-consuming: object inherit- 
ance must be understood, Java classes designed and 
implemented, the Cytoscape API grasped, the code com- 
piled, classpaths resolved, and jar files dynamically loaded. 
Second, the new plugin will be subject to the same con- 
straints and lack of flexibility found in all plugins and in 
Cytoscape itself: only pre-conceived operations which can 
be accomplished via a point-and-click interface are per- 
mitted, with parameterization limited to the provided 
GUI, and with the reproducibility of any analysis reduced 
to recording and repeating a precise series of mouse clicks. 
Cytoscape, in the absence of a strong bioinformatics 
scripting language, can be useful in routine bioinformatics, 
but is not well-suited for the practice of novel bioinfor- 
matics and computational biology. 

With the appearance of the CytoscapeRPC plugin in 2011 
[15] full scripting finally became possible. CytoscapeRPC 
employs XMLRPC [16], a well-known HTTP-based inter- 
process communication protocol which is supported by the 
three top scripting language candidates, Perl, Python and R. 
For the first time, most of Cytoscape s point-and-click com- 
mands, and many other internal operations, could be ac- 
complished by function calls from a scripting language, 
running in another process. Data and networks residing in 
Cytoscape can in addition, be transmitted to the environ- 
ment of the scripting language. Thus augmented, Cytoscape 
becomes well-suited to novel bioinformatics and computa- 
tional biology. The first version of RCytoscape, using 
CytoscapeRPC, appeared in the fall of 2011. 

To make these capabilities convenient to use, and to in- 
sulate RCytoscapes programmer interface from changes 
to Cytoscape API (about which more below, in the discus- 
sion of the recent release of Cytoscape 3.0) one task 
remained: to create an R programing interface to 
Cytoscape so that the details of the XMLRPC protocol are 
hidden, and so that Cytoscape commands appears as "nat- 
ural" operations in the scripting language. Thus, instead of 



We chose R from among the three candidate lan- 
guages. Perl was rejected because it is not interactive, 
and because its popularity within bioinformatics appears to 
have declined in recent years, perhaps due a perceived defi- 
ciency in statistical and modeling domains, for which 
bioinformaticians and computational biologists usually turn 
to Python, R, or a compiled language such as C or C++. 

Python and R are both strong candidates for scripting 
Cytoscape. Both work in interactive and 'batch' modes, 
and both are used in a wide variety of scientific and 
engineering domains. They each support an interface to 
compiled languages (C, C++, Fortran) providing access 
to extra speed or special libraries as needed. 

We selected R over Python. As the open source 
implementation of the S programming language, R 
has been shaped by more than 30 years of use in sta- 
tistics, data mining, and numerical modeling. Though 
R has neither the syntactic elegance of Python, nor 
the dynamic power of Ruby, its long history, and its 
broad adoption as the standard open source "software 
environment for statistical computing and graphics" 
[17] means that it offers unparalleled convenience 
and power for analyzing data. Python, while generally 
acknowledged to be the better-designed language, has 
always lagged R in features needed by those doing 
serious statistics, data-mining, and other data- and 
mathematically-oriented computing. If mailing list 
volume is a reliable indicator, the R/Bioconductor 
bioinformatics community is substantially more active 
than Biopython [18]. Among data-mining and statis- 
tical programming languages, R in recent years has 
become the most popular [19]. 

Many basic and sophisticated statistical functions are 
built into the language itself ('base R'); these often em- 
body the state of the art in these algorithms. Their easy 
availability in the language has for many years attracted 
those wanting to create new capabilities and analyses. 
This positive feedback loop continues to this day: 
existing mathematical and visualization capabilities 
attract additional capabilities because those new cap- 
abilities are easier to create in R than in other lan- 
guages. For those working in the field of network 
inference, in statistically robust analysis of high through- 



xmlrpc ( " http : //localhost : 9000 " , "Cytoscape . selectNodes " , windowID , 

nodeNames) 

RCytoscape scripters can write 

selectNodes (window, nodeNames) 
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put and next-generation sequencing data, R's mathem- 
atical and visualization strengths confer a substantial 
advantage. An integration of Python and Cytoscape 
will be useful, welcome and popular. However, the 
statistical, data-mining, visualization and bioinformatic 
strengths and popularity of R, and the Bioconductor 
Project, led us to choose R as our scripting language, 
and to create a new Bioconductor R package, 
"RCytoscape" [20]. 

Another strength of R over Python - if only by a mat- 
ter of degree - is software support for, and wide adoption 
of the practice of - "reproducible research" [21]. R has 
evolved an integrated set of coding and data distribution 
practices, embodied in the standard R "package" struc- 
ture, expressly designed to support reproducible re- 
search The R package is a standardized collection of 
directories and files, including R source code; C, C++ 
and Fortran source code when applicable; documenta- 
tion files with executable demonstration code snippets; 
any required data; unit tests to establish the reliability 
of the code; and not least, a "vignette". The latter is a 
text document which implements Donald Knuths "lit- 
erate programming" [22]: R code is interspersed with 
narrative text which explains the logic and details of 
the analysis undertaken. Whenever the package is built 
or tested, the code is run, and a pdf version of the vi- 
gnette is created. A well-written vignette and its pack- 
age thus convey everything an interested reader needs 
in order to reproduce an analysis. It thus sets the stage 
for reliably disseminating data, software, and results. 
We include such a package and vignette to complement 
the necessarily less detailed treatment of glioblastoma 
proneural tumors presented in the Discussion section 
below. 

As Gehlenborg et al. [2] emphasize, the assimilation 
and integration of diverse data types into network 
analysis and visualization is an urgent task. With new 
types of molecular measurement data appearing fre- 
quently — as seen in a large public release from the 
ENCODE project [23] in 2012 — an optimal environ- 
ment will have easy access to the data and tools for 
analyzing it. The open-source R-based Bioconductor 
[24] project, along with BioPython and BioPerl, have 
a long history of providing routine access to such 
data and tools. 

In the case study we present below, our focus will be 
upon data visualization in the context of exploratory 
data analysis to demonstrate the cartographic capabilities 
of RCytoscape. Heterogeneous experimental data is mar- 
shaled and filtered in R, molecular pathways assembled, 
a sequence of interactive maps displayed in Cytoscape, 
leading to the identification of molecular interaction pat- 
terns missed by the clustering analysis upon which the 
case study builds. 



Implementation 

RCytoscape connects R and Cytoscape using XMLRPC, a 
standard WWW protocol for transmitting messages and 
data between programs. It is implemented in the 
'CytoscapeRPC plugin and by the complementary open 
source RCytoscape package, written in R and freely avail- 
able through the Bioconductor web site. Nearly all of 
Cytoscapes operations appear as function calls in R. Net- 
works can be assembled from data marshaled in R from 
public or proprietary network databases and then displayed 
in Cytoscape. Conversely, data and networks loaded into 
Cytoscape using native Cytoscape methods may be 
imported into an R session via a simple function call. Net- 
work layout and visual mapping for nodes and edges can 
be specified and applied. Zooming and selection, filtering 
based on attributes, animation (changing node positions, 
changing node and edge colors, shapes and sizes) all be- 
come possible. R is a complete, interpreted and batch- 
oriented programming environment, so network creation, 
manipulation, exploration and analysis can proceed one 
command at a time, or be combined into scripts and pro- 
grams, encouraging reproducible analysis, and thus repro- 
ducible research. By contrast, recreating visualizations in 
Cytoscape, and any analyses performed through the trad- 
itional Cytoscape combination of interactive commands 
and plugins, can only be accomplished by recapitulating a 
precise sequence of mouse clicks, which is error-prone and 
which restricts the dissemination of useful analyses. 

Another virtue of scripting control of Cytoscape is the 
ease which with animations (dynamic displays) can be 
made. When a sequence of Cytoscape maps are 
displayed, or when a single map is modified by scripted 
commands, an extra dimension of information is added. 
This extra dimension may be time, based upon time- 
series experimental data, but it can equally well, for 
example, be cell-type, or treatment. The underlying 
scripting and display techniques are the same: several 
frames are shown in sequence, and optionally saved as 
replayable images (see Additional files 1, 2 and 3). 

RCytoscape provides three methods for distributing 
(publishing') biomolecular maps, static or dynamic, each 
tailored to a different audience. Simple browser publish- 
ing is supported by exporting dynamic network images 
from RCytoscape into a web page. For experienced 
Cytoscape users, networks created with RCytoscape may 
be saved and shared as Cytoscape session files. In labora- 
tory settings where some programming skill is available, 
and particularly where cross-laboratory collaborations 
are involved, code and data can be combined into easily 
shared R packages so that networks, experimental data, 
algorithms and visualization become live documents' 
shared and evolving over time (the case study, below, is 
provided as an R package; Additional files 4 and 5 dem- 
onstrate this valuable scientific practice). 
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With the release of Cytoscape 3.0 in February of 
2013, a new application architecture is available. With 
version 3.1, expected in late 2013, built-in support for 
a "RESTful" [25] inter-process communication proto- 
col is planned, coupled to the new "TaskFactory" API. 
This will provide the basis for the next version of 
RCytoscape. A native Python scripting interface will 
very likely appear then as well. However, even with 
the availability of native Python scripting for Cyto- 
scape, R and RCytoscape will continue to offer unique 
capabilities, power and convenience due to R's long 
history, and its current capabilities and popularity for 
doing bioinformatics and computational biology. 

Results and discussion 

We demonstrate the synergistic capabilities of 
RCytoscape with a small biological case study. (An 
extended version of this case study will be found in 
Additional file 6). Verhaak et al. (2010) used consensus 
average linkage clustering of gene expression and gen- 
omic abnormalities to distinguish four subclasses of 
Glioblastoma multiforme (GBM) tumors. Interestingly, 
these data-derived subclasses correspond to the neural 
lineage of the tumor cells and, to a limited extent, to 
differing treatment strategies. We restrict our attention 
to the least treatable of these four subclass: fifty-five 
tumors in the "Proneural" class. From these we select 
thirteen which best match the signature proposed by 
Verhaak et al. for this subclass: high expression and 
amplification of PDGFRA. We reason that if the path- 
way neighbors of PDGFRA display heterogeneity in 
mRNA expression, copy number and mutations, despite 
PDGFRA consistency, that this heterogeneity may have 



implications for tumor treatment. Combined visual and 
simple statistical exploratory analysis does indeed reveal 
such heterogeneity. 

Complete code and data to reproduce the analysis de- 
scribed here is provided in the supplemental R package 
"ProneuralHeterogeneity" (Additional file 4). For narra- 
tive clarity and to economize on space, we present a typ- 
ical RCytoscape session in "pseudocode", leaving out the 
many fine-grained coding details. In many cases, the 
pseudocode show here differs little from calls to 
functions provided in the supplemental Proneu- 
ralHeterogeneity package. 

Data retrieval 

We use two R packages to retrieve selected KEGG 
pathways and to obtain high throughput assay data 
from TCGA. These packages deliver R data objects to 
the RCytoscape session via a few function calls, 
returning standard data objects which are ready for 
further computation. These simple steps contrast with 
traditional bioinformatic data retrieval, usually accom- 
plished via web browsing and ftp, and followed by 
reformatting of the downloaded data before it is in a 
computable form. 

We retrieved and combined three pathways from the 
many curated by KEGG: "Pathways in Cancer", "Cell 
Cycle" and "Glioma", eliminating redundant reactions, 
and creating a preliminary single network of molecular 
interactions with which to illuminate the TCGA data. 
This combined network consists primarily of signaling 
and regulatory relationships, and provides good coverage 
of the Proneural tumor signature gene (PDGFRA) and 
the signaling pathways in which it is involved. 



library ( KEGGgraph ) 

pathway .names <- c( "glioma" , "cellCycle", "pathwaysInCancer" ) 

tbl. pathways <- combineKeggPathways (pathway . names , organism="hsa" ) 



Use the CGDSR package to etrieve expression and gen- 
omic variation data for the genes in the combined 
KEGG pathway. 



library ( cgdsr ) 

genelDs <- genesInPathways ( tbl . pathways ) 
disease <- "glioblastoma multiforme" 

cgds. server <- CGDS ( " http: //www. cbioportal . org/public-portal/ " ) 
tbl.mrna <- getTCGAdata ( disease , "mRNA" , genelDs) 
tbl.mut <- getTCGAdata (disease, "mutation", genelDs) 
tbl.cnv <- getTCGAdata (disease, "copyNumber" , genelDs) 



Shannon et al. BMC Bioinformatics 2013, 14:217 
http://www.biomedcentral.com/1471 -21 05/1 4/21 7 



Page 6 of 9 



These three TCGA data tables created above contain 
one row for each genelD, and one column for each 
tumor sample in the TCGA repository. We are inter- 
ested here only in those categorized as "Proneural" and 
more specifically, only those which match or exceed the 
signature described in Verhaak et al: log fold change 



mRNA expression of the receptor tyrosine kinase 
PDGFRA greater than 2.0 (compared to reference) and 
PDGFRA copy number classified as "amplified". We call 
these signature-consistent tumors "strong Proneural", 
and created a function to identify them whose details are 
elided here: 



strongProneuralTumors <- identif yTumors ( " strongProneural" ) 



Visualization begins by examining three TCGA-derived 
assays (expression, mutations, copy number) of each of 
the thirteen strong proneural tumors, projecting that data 
onto the Cytoscape display of the combined KEGG net- 
work. Since the network is large, individual details may be 
missed, but the broad characteristics, and heterogeneity of 
the tumors is easily seen. "Visual mapping" rules specify 
how node color, size and shape are to reflect the assay 
data. Each tumor is visualized in turn, and the result (an 
animated image file, "twoTumorsPDGFRAneighborhood. 
gif") can be found in Additional file 1. 



Programmatic visual mapping 

Visual mapping rules are traditionally specified in 
Cytoscape via a graphical user interface (called the 
"vizmapper") but are arguably far easier to handle with R 
commands. We learn the distribution of the mRNA log- 
fold expression, first numerically (expressed as min, max, 
median, first and third quartile), then visually, with a 
histogram. We use this knowledge to parameterize node 
color mapping. 



cyWindow <- display-Network (tbl .pathways ) 

fivenum(tbl.mrna) # -5.82520 -0.63440 -0.01585 0.59170 13.24360 
hist (tbl .mrna) 

dataHingePoints <- c (-6.0, 0.0, 6.0, 14) 
colorSpectrum <- c( "green", "white", "red", "black") 
setNodeColorRule (cyWin, "mRNA", dataHingePoints, colorSpectrum) 



Similarly, the distribution of copy number variation is 
mapped to node border color and thickness, conveying 
deletion and amplification. Amino acid substitutions are 
added to node labels ("PDGFRA C235Y") for the tumors 
in which they appear. We use node shape to indicate 
molecule type (ligand, receptor kinase, transcription 



factor, etc.). Finally, we devise a vizmapping rule for 
node size which uses an informal estimate of overall 
tumorigenic signifcance of each gene so that, for in- 
stance, a highly expressed, amplified gene with a non- 
synonymous mutation is rendered as a node with a large 
diameter. 



tumorigenicScore <- abs(mrna) + cnv + 

( 3 * onSynomymousMutationStatus ) 
setNodeAttribute ( cyWindow, "score", tumorigenicScore) 
nodeSizes <- c (20, 160) 
tumor igenicHingePoints <- c(0, 20) 

setNodeSizeRule ( cyWindow, "score", tumorigenicHingePoints , 
nodeSizes, mode="interpolate" ) 



Network layout can be a complex problem. Automated 
algorithms, of which Cytoscape offers a useful collection, 
have many strengths. We find that these algorithms, 
though useful, can always be improved upon as we seek 



an optimal display for large and/or complex networks. 
RCytoscape encourages a combined layout strategy: an ap- 
propriate automated Cytoscape layout algorithm is applied 
in the early development stages of a project, and is then 
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manually fine-tuned using direct placement of some 
nodes. The final layout is saved for reuse in the future via 
a single function call: 

saveLayout ( cyWindow, "proneuralKeggLayout . RData" ) 
restoreLayout ( cyWindow, "proneuralKeggLayout . RData" ) 



Any network displayed in Cytoscape can be saved as 
an image, and any succession of such displays can be 
assembled into an animation easily viewed in a web 
browser. Three such animations will be found in 
Additional file 1 (two tumors, PDGFRA neighborhood); 



Additional file 2 (thirteen tumors, large cancer-related 
network); Additional file 3 (thirteen tumors, PDGFRA 
neighborhood). Two representative images from the first 
animation (Additional file 1) are presented and discussed 
below. 



for (tumor in strongProneuralTumors ) { 

displayTumor ( cyWindow, tumor, tbl.mrna, tbl.mut, tbl.cnv) 

savelmage ( cyWindow) 

} 



To create the zoomed-in animation, select zoom in on this subnetwork before running the data 
PDGFRA, and its first two network neighbors, then animation: 



selectNodes ( cyWindow, "PDGFRA" ) 
selectFirstNeighborsOf SelectedNodes ( cyWindow) 
selectFirstNeighborsOf SelectedNodes ( cyWindow) 
fitSelection( cyWindow) 
clearSelection( cyWindow) 

for (tumor in strongProneuralTumors) { 

displayTumor (cyWindow, tumor, tbl.mrna, tbl.mut, tbl.cnv) 

s ave Image ( cyWindow ) 

} 



Receptor Tyrosine Kinases (RTKs) 

The animations created above, as well as the two static 
images (Figure 1) selected from the second sequence, re- 
veal heterogeneous expression and copy number in the 
immediate network vicinity of PDGFRA, the gene whose 
abnormalities define the Proneural subtype. PDGFRA is 
a receptor tyrosine kinase (RTK), a class of proteins 
which, when constituitively activated, contribute to un- 
restrained cell growth and tumorigenesis. A full discus- 
sion of RTKs and strong proneural tumor heterogeneity 
will be found in Additional file 6, but we here briefly 
note that although both tumors exhibit amplified and 
over-expressed PDGFRA, only tumor 385 shows the lig- 
and PDGFB to be amplified and over-expressed as well, 
a phenomenon which suggests a possible autocrine loop, 
and a more important role for PDGRFA signaling than 



in tumor 014. Tumor 014, by contrast, shows an over- 
expressed and amplified RTK ligand/receptor pair 
(FGR12 and FGFR2) which is inactive in tumor 0365. 
This contrast, which is immediately visible in the 
RCytoscape-created map, may have implications for the 
selection of therapeutic targets in the treatment of glio- 
blastoma Figure 1. 

Conclusions 

Detailed, data-driven network visualization and the 
open-ended computational power and statistical richness 
of the R programming environment can be useful at all 
stages of analysis of biological data. RCytoscape enables 
the construction of detailed molecular maps that reveal 
significant heterogeneity beyond statistically robust clas- 
sifier consensus average linkage hierarchical clustering 
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3 TCGA.02.0014 




TCGA.08.0385 




Transcriptional Regulation 
Repression 



Phosphorylation 
Dephosphorylation 
Binding/ Associa tion 
Dissociation 
Indirect Effect 




Figure 1 A static display of (a) tumor TCGA.02.0014 and (b) TCGA.0.0835. mRNA differential expression, copy number and mutation data 
mapped onto the same composite reference PDGFRA-neighborhood subnetwork (KEGG 'Pathways in Cancer', 'Glioma', 'Cell Cycle'), (c) Key: Node 
color indicates differential expression (log-fold-change with respect to normal brain tissue), (d) Edges indicate molecular relationships as 
indicated, (e) Node shape indicates functional gene type; node border color and width indicates genomic copy number. Node size indicates 
possible tumorigenic importance, a function of mRNA expression, copy number and non-synonymous mutation. Non-aberrant genes and 
relationships are dimmed out. 



obtained by Verhaak et al [16]. Differences seen in the 
RCytoscape-derived maps suggest different autocrine 
loops may be active in proneural glioblastoma tumors, 
with two complimentary gene/protein pairs in each. 
Such a classification is valuable, and can set the stage for 
the careful characterization of individual tumors and 
design of new treatment strategies. Subsequent steps 
along the road to clinically useful prediction and control 
of these phenomena will involve more rounds of 
confirmatory and exploratory data analysis, simulation 
and modeling. 

Linking R with Cytoscape has value for many different 
kinds of analyses beyond what is described here. Recently, 
Grimes et al, [26] used RCytoscape to graph nodes in two 
and three dimensions using coordinates derived from 
dimension reduction (embedding) algorithms, and plot 



networks using protein-protein interaction edges merged 
from three different databases, setting visual properties for 
nodes and edges based on quantitative information from a 
lung cancer phosphoproteomic data set. RCytoscape will 
be useful to anyone who wishes to add network visua- 
lization and analysis to the rich resources available in the 
R programming environment. 

Availability of supporting software and data 

We include an R package "ProneuralHeterogeneity" in 
Additional file 4, which includes complete data, docu- 
mentation, and unit-tested executable code with which 
to reproduce the two maps shown above. Code is also 
provided for the creation of more comprehensive 
networks, and for visualizing any number of the TCGA 
GBM tumors, along with an animation showing each of 
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the thirteen "strong proneural" tumors in a loop for 
comparison and exploration. 

Availability and requirements 

Project name: RCytoscape 

Project home page: http://bioconductor.org/packages/ 
release/bioc/html/RCytoscape.html 
Operating system(s): Platform independent 
Programming language: R 

Other requirements: R >= 2.15, Cytoscape >= 2.8.0 
License: GNU GPL-2 

Additional files 
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Additional file 1: (twoTumorsPDGFRAneighborhood.gif: an 
animated data display of the PDGFRA neighborhood of the two 
tumors discussed in the body of the article). 

Additional file 2: (thirteenTumorsFullNetwork.gif: an animated data 
display of thirteen tumors (a superset of the two discussed in the 
paper and animated in Additional file 4) in the context of a much 
larger collection of cancer-related networks). 

Additional file 3: (thirteenTumorsPDGFRAneighborhood.gif. an 
animated data display of the "strong proneural" tumors, focused in 
upon the network neighborhood of PDGFRA). 

Additional file 4: (The ProneuralHeterogeneity R package, with 
annotated executable code and data with which to reproduce a 
superset of the analysis presented in the article). 

Additional file 5: (Proneural Heterogeneity vignette). 

Additional file 6: (An extended discussion of Receptor Tyrosine 
Kinase activation in heterogeneous Proneural GBM tumors 
(an abbreviated version appears in the body of the article). 



